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Abstract 



J> ■ We present a method for total energy minimizations and molecular dy- 

namics simulations based either on tight-binding or on Kohn-Sham hamilto- 
■ nians. The method leads to an algorithm whose computational cost scales 

ON ■ linearly with the system size. The key features of our approach are (i) an 



orbital formulation with single particle wavefunctions constrained to be lo- 



a 

■ calized in given regions of space, and (ii) an energy functional which does 

not require either explicit orthogonalization of the electronic orbitals, or in- 



version of an overlap matrix. The foundations and accuracy of the approach 
and the performances of the algorithm are discussed, and illustrated with sev- 
eral numerical examples including Kohn-Sham hamiltonians. In particular we 
present calculations with tight-binding hamiltonians for diamond, graphite, 
a carbon linear chain and liquid carbon at low pressure. Even for a complex 
case such as liquid carbon - a disordered metallic system with differently co- 
ordinated atoms - the agreement between standard diagonalization schemes 
and our approach is very good. Our results establish the accuracy and relia- 
bility of the method for a wide class of systems and show that tight binding 
molecular dynamics simulations with a few thousand atoms are feasible on 
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small workstations. 
71.10.+X, 71.20.Ad. 
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I. INTRODUCTION 



Many studies of materials carried out nowadays in condensed matter physics are based 
on total energy calculations and molecular dynamics simulations with forces derived either 
from first principles (FP) or tight binding (TB) hamiltoniansS. These computations rely on 
a single particle orbital formulation of the electronic problem. Within such a framework, the 
calculation of the total energy amounts to the solution of a set of eigenvalue equations (e.g. 
the Kohn-Sham equations, in Density Functional Theory), which is obtained by diagonalizing 
the hamiltonian matrix (H). H is usually set up according to a chosen basis set for the 
electronic orbitals. Both direct and iterative diagonalizations imply an overall scaling of the 
computational effort which grows as the third power of the number of electronic states, and 
thus as the cube of the number of atoms in the systems. This unfavorable scaling is a major 
limitation to the use of TB and FP hamiltonians for systems containing more than a few 
hundred and a few thousand electrons, respectively. 

Iterative diagonalizations have been utilized in the study of a variety of systems in 
recent years; indeed when the number M of basis functions is much larger than the number 
N of electronic states these schemes are much more efficient than direct diagonalizations. 
There are two types of iterative approaches: constrained minimization (CM) method! in 
which the single particle wavefunctions are required to be orthonormal and unconstrained 
(UM) methodsiJl, in which the orbitals are allowed to overlap. In computations with plane 
wave (PW) basis sets and pseudopotentials -which are the ones most widely used in, e.g., 
first principles molecular dynamics simulations!- the evaluation of {H(f)}, i.e. of Hcf) for 
the N electronic states, requires O(NM) operations (M is proportional to N). This is so 
if advantage is taken of fast Fourier transform techniques and of the localized nature of 
non local pseudopotentials. The application of orthogonality constraints implies instead 
0(N 2 M) operations. When UM are used, the calculation of the overlap matrix (S) and of 
its inverse are of 0(N 2 M) and 0(iV 3 ), respectively. 

Recently several groups have proposed methods to overcome the problem of the so called 
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iV 3 scaling, and algorithms with linear system-size scaling^ These approaches are 

usually referred to as O(N) methods. Some of them are based on an orbital formulation 
of the electronic problem&BiSEl, whereas others rely on formulations without single par- 
ticle wavef unctions, but based on the direct calculation either of the one electron Green 
functioni'0 or of the density matrixSS. 

A key idea! of O(N) orbital schemes is to use wavefunctions forced to be localized in 
given regions of space. These regions are to be chosen appropriately, i.e. large enough so 
that the effect of localization constraints be made negligible on the computed properties. 
The solution of the eigenvalue problem by searching directly the eigenstates is therefore 
abandoned in favor of a search for a linear combination of eigenstates which is localized 
in real space. In this way the total number of expansion coefficients used to represent 
the localized electronic orbitals depends linearly on the size of the system and the number 
of operations needed for the evaluation of {H<p} can be reduced^ to O(N). The idea of 
working with localized wavefunctions is directly related to that of taking advantage of the 
local nature of the density matrix (p) in real spacei0, by considering the elements p,ij to be 
zero for distances larger than an appropriate cutoff (localization) radius. 

In order to reduce to O(N) operations not only the calculation of {H<p} but also iterative 
orthogonalization procedures or the S inversion, one should in principle resort to assumptions 
on the form of the overlap matrix. If the off diagonal elements of S can be made appropriately 
small, with respect to its diagonal elements, then the matrix can be inverted with an iterative 
procedure whose number of iterations does not increase with the size of the system, and 
which therefore implies a number of operations scaling linearly with system size. However 
the problem of imposing explicit orthogonalization constraints or of inverting S can be 
solved without any assumption or approximation. One can define a functional with implicit 
orthogonalization constraints, containing only the S matrix but not its inverse, in such a 
way that it has exactly the same minimum as the Kohn-Sham density functional. One can 
therefore use a functional which is "easier" both to evaluate and to minimize than those 
used in standard CM and UM, which nevertheless has the "correct" ground state energy and 



charge density. This is another key idea of the O(N) orbital scheme which was introduced 
in Ref .1. 

Such an approach will be presented in detail in section II of this paper. In section III 
we discuss numerical results obtained for first principles calculations within density func- 
tional theory, in the local density approximation. In section IV we demonstrate that an 
algorithm with linear system-size scaling can be obtained when the functional with implicit 
orthogonalization constraints is minimized with respect to localized orbitals. Section V and 
VI contain our results for the minimization of tight binding hamiltonians and for molecular 
dynamics simulations, respectively. Summary and conclusions are given in section VI. 

II. AN ENERGY FUNCTIONAL WITH IMPLICIT ORTHOGONALIZATION 

CONSTRAINTS 

A. Definition and characterization of the energy functional 

The key points of the unconstrained minimization method introduced in Ref. i are: (i) 
the replacement of the inverse of the overlap matrix, entering the energy functional used in 
standard UM methods, with its series expansion in (I — S) up to an odd order Af, where I 
is the identity matrix; (ii) the implicit inclusion of orthonormality constraints in the energy 
functional, at variance with standard CM methods, where orthonormality constraints are 
treated explicitly, i.e. as Lagrange multipliers. After defining the novel energy functional 
which satisfies properties (i) and (ii), we will prove that: (1) this energy functional has the 
Kohn-Sham ground state energy (Eq) as its absolute minimum and (2) its minimization 
yields orthonormal orbitals. 

We consider an energy functional of N/2 overlapping orbitals {(f)} expanded in a finite 
basis set, and of the (N/2 x N/2) matrix A: 

N/2 . 

E[A, {0}] = 2(£ Mi < U ~ g V% > +F[p}) +v(N-J dvp(v)) (1) 

ij 
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where p(r) = p[A, {</>}] (r) = 2 Y^j 2 ^kj<Aj( r )<M r )) F[p] is the sum of the Hartree, exchange- 
correlation and external potential energy functionals and rj a constant to be specified. The 
factor 2 accounts for the electronic occupation numbers, which are assumed to be all equal. 
For simplicity we consider real orbitals. According to the choice of the matrix A, one 
can obtain either the functional used in standard UM methods or the energy functional 
which we introduced in Ref.i. If A = S^ 1 , where SV,- =< 4>i\4>j >, then p[S -1 ] is the 
single particle charge density p(r) and the term multiplying rj is zero; in this case the 
functional of Eq. |l] is the total energy of interacting electrons in an external field according to 
DFT, written for overlapping orbitals!!!. In particular, if the wavefunctions are orthonormal 
then A = S _1 = I, and Eq. |l| gives the total energy functional of DFT used in CM and 
ab-initio molecular dynamics (MD) simulations^. We indicate with {ifj} and {<f>} sets of 
orthonormal and overlapping orbitals respectively, and with £ lJ -[{^}] the energy functional 
of CM procedures. The sets {ip} and {<f>} are related by the Lowdin transformational 
A = Ej ^ V2 0j and then E^fS" 1 ^] = E[S~\ {0}]. Therefore 

mintfHM] = ™*E[8-\ M] = E o- (2) 

M W 

The energy functional of {<f>}, E[Q [{(f)}], {</>}], which we introduced in Ref.i is obtained 
by taking A = Q where 

Q = f](I-Sr (3) 

n=0 

and M is odd. Q is the the truncated series expansion of S _1 . We note that similarly to E L 
and £?[S -1 ], E[Q] is invariant under unitary transformations in the subspace of occupied 
states, i.e. under the transformation = Ylf^ 2 Uij<f>j, where U is a (N/2 x N/2) unitary 
matrix. 

We now prove that the absolute minimum of E[Q, {(f)}] is E . If the orbitals are orthonor- 
mal, i.e. Sij = 6ij, then Qij = 5ij and E[Q, {ip}] coincides with -*-[{■?/>}]: min^} i? ± [{^}] = 
min{^} E[Q, {if)}]. Furthermore, since {ip} is a subset of {<f>}, min^} E[Q, {ip}] > 
min^} E[Q, {4>}]. As a consequence 
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rmnE 1 ^}}^ mm E[Q, {</>}}. (4) 

m {<t>} 

This shows that E , is an upper bound to min^} E[Q, {(ft}}. 

We consider the difference between the functionals E[Q, {0}] and _E[S _1 , {</>}] i.e. 

A £ = g [Q, W ]- g [S-', W] = | a£|A W- W ' dA (5) 

where A(A) = A(Q - S" 1 ) + S" 1 Using Eq. (0), Eq. <g) becomes: 

N/2 

AE = 2J2< HH KS - V \4>i > (Qij - S-/) (6) 

where H KS = -±V 2 + V KS , with V KS = £ d\V KS [p{\)\ and V KS [p] = s -§ . H KS is a 
Kohn-Sham hamiltonian, where the self-consistent potential is averaged over the integration 
path (A) of Eq. (|5]). Given a finite basis set for the orbitals {0}, one can choose 77 large 
enough so that the operator (Hxs~v) ls negative definite; then also the (N/2 x N/2) matrix 
< <pj\H Ks — ^^i > is negative definite. Using the expression of the sum of a geometric series 
for Q, we can express the difference between Q and S _1 as (Q — S" 1 ) = — S _1 (I — S)^ 1 = 
— (I — S) A ^ +1 S~ 1 . If M is odd, the difference between Q and S" 1 results to be a non positive 
definite matrix since S, S -1 and (I — S) Ar+1 are commuting non negative definite matrices. 
Therefore if rj and M fulfill the above requirements, AE is non negative since it is equal to the 
trace of the product of a negative and of a non positive definite matrix. As a consequence, 
for each set of {0} 

E[Q, {</>}] >E[S-\ {</>}]. (7) 

The equality holds only if (Q — S) _1 is equal to zero and therefore only if S = I. Eq.(^) 
shows that E is a lower bound to min^} E[Q, {0}]. From Eqs. [| |] and [7] we have 

mmE x [{^}} = min£[Q, {0}] = min^S" 1 , {0}] = E . (8) 
{V>} {</>} {</>} 

This proves that the energy functional E[Q] has the Kohn-Sham ground state energy (Eq) 
as its absolute minimum, if rj and M fulfill the requirements discussed above. 
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We showed that E[Q] and -EfS -1 ] are equal only if the orbitals are orthonormal [Eq. ((7|)] 
and that at the minimum the two functionals are equal [Eq. (§)]. It then follows that the 
minimization of E[Q] yields orthonormal orbitals. 

The choice of rj which makes (Hks — v) negative definite deserves some comments. If the 
hamiltonian of the system does not depend on p, a rj larger than the hamiltonian maximum 
eigenvalue e max insures that AE > 0. Within LDA, one can prove that if^s[p[Q]] < H H [p], 
where Hh[p] = [— |V 2 + Vh[p] + V ext \, Vr and V ext are the Hartree and external potential, 
respectively, and p = p[S -1 ]. This follows from the property p[Q](r) < p(r), valid for each 
point r, and from the explicit LDA expression of the exchange and correlation energy as a 
function of p(r). Within, e.g., a plane wave implementation with a finite cutoff, Hh has 
an upper bound. This insures the existence of rj such that AE > 0. However in practical 
implementations one can choose rj smaller than the upper bound of Hh', indeed for practical 
purposes it is not necessary to require Eq to be the absolute minimum of -E[Q], but it is 
sufficient to require it to be a local minimum of -E[Q]; the constant rj which fulfills this 
weaker condition is in general much smaller that the upper bound of Hh, as we will discuss 
in the next section. 



B. Iterative minimization of the energy functional 

In this section we discuss the choices of rj appropriate in practical applications and the 
convergence rate of iterative minimizations of E[Q] with J\f = 1, compared to that of E L . 
For non self consistent hamiltonians, we will show that if 77 is larger than the Fermi energy, 
then Eq is a local minimum of J5[Q]; furthermore if a value of rj is chosen, which is close 
to the Fermi energy, the minimum of E[Q] and that of E x can be obtained with the same 
computational efficiency. 

The asymptotic convergence rate of iterative minimizations of a functional i?[{0}] can 
be estimated by expanding it around its minimum E , up to second order in the variation of 
the wavefunctions {0}. As discussed, e.g., in Ref.0, in the minimization asymptotic regime 
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the number of integration steps to reach convergence is directly related to the ratio between 
the maximum and the minimum eigenvalues of the quadratic form which results from the 
second order expansion of (E — Eq). 

We consider a non self-consistent hamiltonian (H) and we relate its eigenvalues ({e}) 
to those of the quadratic expansion of (^[Q] — E ). Since E[Q] is invariant under unitary- 
transformations in the subspace of occupied states, a generic variation of the wavefunction 
with respect to the ground state can be written as : 

\<f>i >= \Xi > +|A* >, (9) 

with 

|A,>= £ cflx? > ■ (10) 

l€[TOT] 

Here \xi > are the eigenstates of H, and the indeces i and I belong to the set of occupied 
states and to the set of occupied plus empty states, respectively. We denote with [OCC] and 
\EMV\ the sets of occupied and empty states, and with [TOT] the union of the two sets. c\ 
are expansion coefficients of | Aj > over the eigenstates of H. If Eq. |9] is substituted into the 
expression of AE = E[Q] — Eq, the first order term vanishes, showing that for each value of 
i] the orbitals \Xi > make E[Q] stationary. The stationary point is in particular an absolute 
minimum if rj > e max , as shown in the previous section. One is then left with a second order 
term, which can be recast as follows: 

A£= E E 2[ em -e t ](c4) 2 + £ 8[ V - { ^±^][^=(c) + 4)? (11) 

m£[£MV] i£[OCC] ije[OCC] ^ l 

From Eq. |11] it is seen that the quadratic form AE has two sets of normal modes. The first 
set has eigenvalues k^mi) = 2[e m — e»], which are always positive and independent of ^; they 
correspond to the coordinates d m . These modes are associated with an increase of the total 
energy when the orbitals acquire non zero components on empty eigenstates of H. They are 
the same as the normal modes of (E 1 - — E ), calculated in Ref.0. The second set of normal 
modes of AE has eigenvalues ku^ = 8[r] — t ]; they correspond to the coordinates 
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[■j= (c*- + c-)]. These modes are associated with a change of E[Q] due to the overlap of 
the electronic wavefunctions; they are indeed associated with the orthogonality constraints 
implicitly included in the definition of E[Q\ and they are not present in (E 1 - — Eq). 

For i] larger than the highest occupied eigenvalue of H, (i-e. the Fermi energy), the 
k(ij) are positive and thus Eq is a local minimum of i?[Q]. rj > €n/2 is a weaker condition 
than the one required to prove Eq. (§); it is however a sufficient condition to insure that the 
minimization of E[Q] leads to the correct ground state energy, provided a reasonable starting 
point for the minimization is chosen. This will be shown also with numerical examples in 
the next section. 

The minimizations of E[Q] and E L can be obtained with the same efficiency provided 
the weaker condition on r\ is adopted. For example one can choose r\ ~ ejv/2+i- Under such 
a condition the ratio between the maximum and the minimum eigenvalues of the expansion 
of (-E[Q] — E ) and of (E 1 - — E Q ) is the same in most systems. Indeed the eigenvalues 
k lie in the interval defined by the eigenvalues k, if the spread in energy of the excited 
states of H is four times smaller than the valence band width. This condition is satisfied in 
most systems of interest. This means that in practice iterative minimizations of E[Q] and 
MD simulations with E[Q\ can be performed with the same efficiency as the corresponding 
calculations with E L . However, if rj is chosen so that Eq is an absolute minimum of -E[Q], 
the time step used in MD simulations, which is proportionaflll to the square root of the 
maximum eigenvalue of AE (Eq. [TIP , is reduced by a factor of two with respect to that used 
in standard calculations. 

The functional introduced in II. A has clear advantages over standard energy functionals 
when conjugate and preconditioned conjugate gradient minimization procedures are used: 
the complication of imposing orthonormality constraints is avoided, and contrary to ordi- 
nary unconstrained methods an automatic control of the S matrix is provided, since at the 
minimum S = I. Furthermore, when preconditioning of the high frequency components of 
the single particle wavefunctions is introduced, e.g. in Car-Parrinello molecular dynamics 
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simulations, the integration of the electronic equation of motion does not imply any extra 
work, at variance with integration schemes with explicit orthogonalization constraints^!. 



C. Relationship with other functionals 

The total energy minimization scheme which we introduced in Ref.i is related to other 
approaches recently proposed in the literature for electronic structure calculations with linear 
system-size scaling. In particular Ordejon et al.0 derived the same functional as that of 
Eqs. 1 and 3 for M = 1 for non self-consistent hamiltonians. Their derivation is based on 
a Lagrangian formulation with explicit orthogonalization constraints, where the Lagrange 
multipliers (Ay) are approximated by an expression which is exact only at the minimum, i.e. 
Xij =< (j>i\H\(f)j >. The approach presented by Ordejon et al.El is similar to that of Wang 
and Teterl, although in Ref.0 constraints are introduced by means of a penalty function. 
However the minimum of the Wang and Teter functional is E only if the weight of the 
penalty function goes to infinity, at variance with our and Ordejon et al.'s functionals which 
at the minimum is always equal to Eq. 

Instead of using an orbital formulation, Li, Numes and Vanderbilti and Daw0 proposed 
a functional for total energy minimizations within a density matrix formulation. In this 
case one minimizes the energy functional with respect to the density matrix, which must 
fulfill the idempotency condition. This condition is enforced by minimizing the total energy 
with respect to a purified version of the density matrixilll (p(r, r')), constructed from a 
trial density matrix p(r,r') in such a way that its eigenvalues lie on the interval [0,1]. The 
energy functional E[Q] (Eq. 1,3) for non self-consistent hamiltonians can be re-derived 
within the formulation of Ref.i if p(r, r') is expressed in terms of the occupied single particle 
wavefunctions, i.e. p(r, r') = J2i & [occ] <M r )<M r/ )) an d a purification transformation is chosen 
such that p = I — {I — p)-^ +1 . This transformation forces the eigenvalues of p to be less than 1 
only if M is odd; one does not need to force the eigenvalues to be positive, as done in Ref.i, 
since by construction p{r,r') = J2ie[occ] 0i( r )<M r ') nas a number of non zero eigenvalues 
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equal to the number of occupied states. 



III. NUMERICAL RESULTS OF FIRST PRINCIPLES CALCULATIONS 

The validity of the minimization scheme presented in Section II was tested numerically for 
KS hamiltonians within LDA, by computing the ground state energy of Si in the diamond 
structure. We used an expansion coefficient M = 1 to define the Q matrix entering the 
energy functional (see Eq. 3). We chose rj smaller that the maximum eigenvalue of H^s'-, 
this choice insures the iterative minimization to properly converge to the ground state energy 
Eq, unless a pathological starting point for the electronic orbitals is chosen. 

E[Q] was minimized by steepest descent; the derivative of the functional with respect to 
the single particle orbitals is given by: 

= AJ2[(H KS - fj)\(j>j > (2fy - Sfl) - fa >< <P 3 \{H KS - 77)10, >] (12) 

CHpi j 

The orbitals were expanded in PW with a kinetic energy cutoff (E cut ) of 12 Ry and the 
interaction between ionic cores and valence electrons was described by a norm conserving 
pseudopotentialEl expressed in a separable form0. The calculation was started from orbitals 
set up from random numbers, with rj set at 3.0 Ry above the top of the valence band. In 
Fig. 1 we report E L and E[Q] as a function of the number of iterations; it is seen that 
the minimizations of the two functionals require the same number of iterations and leads to 
the same energy. Fig. 2 shows the integral of the charge density during the minimization 
procedure. For J\f — 1, AiV = N — J drp(r) = N — Tr (QS) is given by 

AiV = Tr((I — S) 2 ). (13) 

This is a positive quantity which goes to zero as the orbitals become orthonormal. In our 
calculation the difference AiV between the total number of electrons and the integrated 
charge reaches a value very close to zero (~ 10~ 6 ) after 10 iterations, showing that the 
single particle wavefunctions are orthonormal already well before reaching the minimum. 
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IV. LOCALIZED ORBITALS AND AN ALGORITHM WITH LINEAR 

SYSTEM-SIZE SCALING 



We now turn to the discussion of the approach introduced in section II within a localized 
orbital (LO) formulationi. Within such a formulation, each single particle wavefunction 
is constrained to be localized in an appropriate region of space, which we call localization 
region (LR): the electronic orbitals are free to vary inside and are zero outside the LR. 
Different single particle orbitals can be associated to the same LR, e.g. two doubly occupied 
orbitals per LR for C and Si, which have four valence electrons. The extension of a LR is 
determined by the bonding properties of the atomic species composing the system, and it is 
the same unrespective of the size of the system which is simulated. The choice of the centers 
of the LRs is arbitrary. In all of our calculations (see next section) we centered the LRs on 
atomic sites; this choice is physically unbiased, i.e. it can be adopted for a generic system 
whose bonding properties are totally unknown, and allows for a solution which satisfies 
charge neutrality conditions. If one wants to take advantage of known properties of the 
system, LRs can for example be centered on atomic bonds or on positions compatible with 
the symmetry of the Wannier functions, if these can be defined. This is however difficult to 
do, e.g. at each step of a MD simulation, where the evolution of the bonding properties as 
a function of time is not known. One could also treat the centers of the LRs as variational 
parameters and optimize their locations during the calculation. 

We now consider the minimization of E[Q] with respect to LO {{4> L })- When the orbitals 
are localized, Sy, and < (p^lHxsl^j > are sparse matrices which have non zero elements only 
if i and j belong to overlapping LRs. The evaluation of E[Q] (Eqs. 1,3) as well as of 9 q^l 1 
(Eq. 12) implies only the calculation of matrix products containing SV, and < <j)f \Hxs\ ( t >1 j >• 
No orthogonalization or S inversion is needed. Thus at each step, the minimization of E[Q] 
can be performed with a number of operations which is proportional to the system size. 

When localization constraints are imposed, the variational freedom of the minimization 
procedure is reduced. The energy obtained by minimizing a functional with respect to 
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LOs is then larger than the absolute minimum (Eq) obtained with no constraints on the 
single particle wavef unctions. In particular, the minimum of E[Q] with respect to LO {<j) L } 
does not coincide with that of E'fS -1 ], and the LO which minimize E[Q\ are in general not 
orthonormal. This is easily seen as follows. Whereas Eq. ^ and [7| hold also for LO, Eq. ^] is 
no longer valid when localization constraints are imposed. Indeed the transformation from 
{i/>} to {0} with S™ 1 / 2 does not preserve the size of the LR, i.e. it does not map functions 
localized in a given region onto functions localized in the same region of space. Therefore 
Eq. H does not hold but is replaced by 

minE 1 > minMQJ > minMS _1 l > E , (14) 

H L } W L } W L } 

where the LR for the {ip L } and {4> L } are the same. Since in Eq. ([14] ) the equality is in 
general not satisfied, at the minimum S is different from I, contrary to the case of extended 
orbitals. 

The variational quality of the results obtained by minimizing -E[Q], i.e. the difference 
[minj^L} E[Q] — E ], depends upon (i) the order Af chosen for the definition of the Q matrix 
and (ii) the size of the LR. For S < 21, it is easy to see that E[Q(Af - 2)] > E[Q(Af)]. 
Therefore by increasing Af in the definition of Q, one obtains an improvement of the total 
energy. This leads as well to an increase of the number of operations needed in the com- 
putation of Q (see Eq. 3). Most importantly, in order to improve the quality of the results 
one can choose to increase the size of the localization region. We note that the number of 
non zero elements of S is proportional to n^N ', where is the average number of regions 
overlapping with a given one. Instead the number of degrees of freedom needed to define 
the N/2 single particle orbitals is proportional to mN, where m is the number of points 
belonging to a LR, e.g. the number of points where the wavefunction is non zero. The ratio 
n Ln/ m strongly depends on the basis set chosen to set up the hamiltonian. The optimal 
choice of Af and of size of the LRs, e.g. of the parameters determining the efficiency and 
accuracy of the computation, crucially depends upon the chosen basis set. 

In calculations where m ^> ulr, the computer time for the S inversion amounts to a 
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small fraction of the total time also for relatively large systems (e.g. systems with up to a few 
thousand electrons in LDA calculations with PW basis). On the contrary for computations 
with small basis sets, such as those with TB hamiltonians, the computer time for the S 
inversion constitutes a considerable part of the total time already for small systems (i.e. 
containing a few tens of atoms). 

V. MINIMIZATION OF TB HAMILTONIANS 

The LO formulation was tested numerically using TB hamiltoniansilli! with the conven- 
tion e s + e p = 0. We performed calculations for Si and C in different aggregation states. In 
calculations for crystalline structures, we considered non zero hopping terms only between 
first neighbors. We chose a number of LRs equal to the number of atoms and we centered 
each LR at an atomic site (J). In a TB picture a LR can be identified with the set of atoms 
belonging to it. For each site /, we label the set of atoms which belong to a LR with LRj. C 
and Si atoms have four valence electrons and there are two doubly occupied states for each 
atom in the system. We then associated two states to each LR: The two wavefunctions of 
the LR centered in / have non zero components on the atoms belonging to the set LRj and 
zero components (expansion coefficients) on the atoms which do not belong to LRj. The 
expansion coefficients of the single particle orbitals are treated as variational parameters in 
our calculations. The total number of expansion coefficients grows linearly with the size of 
the system. 

We tested two different shapes of the LR. In one case an atom is defined as belonging 
to LRj if its distance to the site / is less than or equal to a given radius r c (in other words, 
an euclidean metric is used to define the shape of the LR). In the second case, we took 
advantage of the form the TB hamiltonian and we considered an atom as belonging to LRj 
if it is connected to the site / by a number of non zero hopping terms less than or equal to 
a given number of shells Nh- 

In all calculations E[Q] was minimized with respect to 0^ by a conjugate gradient (CG) 
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procedure. The gradients d jfi ' are simply obtained by projecting Eq. (12) onto the LR 
where 0f is defined. For non self-consistent hamiltonians, the line minimization required 
in a CG procedure reduces to the minimization of a quartic polynomial in the variation of 
the wavefunction, along the conjugate direction. In our calculation the line minimization is 
performed exactly by evaluating the coefficients of the quartic polynomial. 

We found that when localization constraints are imposed, E[Q] can have local min- 
ima and metastable states, where the system may be trapped for a long time during the 
minimization procedure, before reaching a minimum. This problem can be overcome if an 
appropriate choice of the initial guess for the iterative diagonalization is made. In all of our 
calculations we used starting wavefunctions with non zero components only on the site / 
where they were centered; furthermore orbital components were the same for each /. This 
choice allowed to avoid local minima and metastable state traps for a wide class of ionic 
configurations. The problem of being trapped in metastable states or local minima involves 
only electronic minimizations; it does not concern MD simulations, where the ground state 
orbitals of a given step can be used as guess wavefunctions for the following step. 

Fig. 3 shows the percentage error on the cohesive energy E c of Si in the diamond structure, 
as a function of the size of the LR, computed with respect to a calculation perfomed with 
extended orbitals. All computations were carried out with 216 atom supercells, simple cubic 
periodic boundary conditions and the T point only for the supercell Brillouin zone (BZ) 
sampling. E c was evaluated with Q[jV = 1] and Q[Af — 3] and with rj = 3 eV. The shape of 
the LR was first chosen using an euclidean metric. We denote with N e the number of shells 
included in a LR, defined according to such a metric. It is seen that E c converges rapidly as 
a function of N e , with both Af = 1 and 3. Already with N e = 2 (17 atoms belong to a LR) 
the results are very good, i.e. E c is higher than the result obtained with extended orbitals 
by only 2.1% and 0.8% for Af = 1 and 3, respectively. For AT — 1, the error on the total 
charge AiV (see Eq. 13) which gives the deviation from orthonormality due to localization 
constraints is in general very small; already for N e = 2 we find it to be 0.2%. We note that 
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when going from N e = 3 (29 atoms in a LR) to N e = 4 (35 atoms in a LR), we obtain the 
smallest variation of E c . Indeed the atoms added to a LR when including also the fourth 
neighbor shell are not connected by hopping terms to those defining a LR when N e = 3. 
This suggests that a definition of LR based on hopping terms is more physical than one 
based on the euclidean metric. We repeated the calculations with M = 1 by choosing the 
LRs according to the hopping parameters and by setting the number of hopping shells Nh 
at 3. (For the diamond lattice, the definition of LRs using the two metrics are different for 
N h and N e larger than two). The choice N h = 3 amounts to considering 41 atoms in a LR. 
The percentage error (0.7 %) on E c is very close to that obtained with N e = 5 (0.6 %), 
although the number of atoms in a given LR is bigger (47). The choice of the shape of the 
LRs according to the hopping parameters is superior to that of the euclidean metric and 
it is especially so when energy differences between different structures are to be computed. 
This is the definition which was adopted in all subsequent calculations for C. 

Results for carbon in different crystal structures are presented in Tables I and II and in 
Fig. 4. We chose systems with different bonding and electronic properties: a sp 3 bonded 
insulator, diamond, a sp 2 bonded semi-metal, planar graphite, and a sp bonded metal, 
a non dimerized C chain. Table I shows the binding energy of the three structures as 
a function of the size of the LR. The calculations were performed with E[Q(J\f = 1)]. 
The errors for N h — 2 and N h — 3 are of the same order of those found in the case of 
silicon, and in particular we find that already for N h — 2 the LO formulation and a direct 
diagonalization scheme are in good accord. In Fig. 4 we compare the total energy of the 
three C systems as a function of the lattice parameter, as obtained by direct diagonalization 
of the hamiltonian and by minimizing E[Q(J\f = 1)] with respect to LO, with Nh = 2. The 
agreement between the two calculations is again very good for the three systems, in spite 
of their different bonding and electronic properties. The percentage difference between the 
computed equilibrium properties (lattice constant, cohesive energy and bulk modulus) are 
given in Table II. 
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VI. MOLECULAR DYNAMICS WITH TB HAMILTONIANS 



By using the functional E[Q] and localized orbitals one can set up a MD scheme in 
which the computational cost of each step scales linearly with the system size. According 
to the Helmann-Feynman theorem, one can obtain the forces acting on a given atom / by 
computing F/ = —Vj-EfQ; {</>/,}, {R/}]; here (Rj) denotes ionic positions and {4>l} are 
the localized orbitals which minimize i?[Q]. The general expression of the ionic forces is 
given by Fj = —2YnJ 2 Qij < 0i| >j wnere ^ indicates the external potential in a 

LDA calculation and the hamiltonian in a TB calculation. In practical computations it is 
convenient to first calculate the auxiliary wavefunction 

N/2 

\Ti>=Y,Qij\<f>j> (is) 

3 

and then to evaluate the expression of Fj as follows: 

N/2 dv - 
f 7 = -25 < hlg^lb >• ( 16 ) 

The ground state wavefunctions {4>l} can be obtained either by evolving the electronic 
states according to a Car-Parrinello@ dynamics (see, e.g. Ref.H), or by minimizing the energy 
functional E[Q] at each ionic move. In our simulations, we determine the sets LRj at each 
ionic step; consequently the sites belonging to a set vary as a function of time, when, e.g. the 
atoms are diffusing or changing their local coordination. This implies an abrupt modification 
of the basis functions used for the expansion of {4>l} and therefore a discontinuity of {4>l} 
as a function of the ionic positions. In correspondence to any change of the sets LRj, E[Q>] 
must be minimized with respect to the electronic degrees of freedom; we therefore chose to 
minimize the energy functional at each ionic step, unrespective of whether the LR changes 
at a given step. The minimizations were performed with a conjugate gradient procedure 
where we used as initial guess for the orbitals the linear extrapolation of the minimized 
wavefunctions of the two previous ionic steps, as suggested in Refl. 

In order to test the accuracy and efficiency of the LO scheme for different classes of 
systems, we performed MD simulations for a crystalline insulator, i.e. diamond at low 



temperature, and for a liquid metal, i.e. liquid carbon at T ~ 5000/T. As for the calculations 
for C presented in the previous section, we adopted LRs centered on atoms, which include 
up to second shell of neighbors and whose shape is determined by the hopping parameters. 

We first discuss the case of crystalline diamond, when the sets LRi do not vary in time. 
We find that for diamond our MD scheme allows for a correct description of the total energy 
oscillations, around equilibrium, consistently with what previously obtainedH for Si. We 
performed two simulations, one with a 64 atom and the other with a 1000 atom supercell. 
In both cases we started from a ionic configuration with zero velocities, generated by giving 
a random displacement to the atoms up to .03 A with respect to their equilibrium positions. 
The integration time step (At) used in the simulations was 30 a.u. and the number of CG 
iterations per ionic move was 10. In Fig. 5 we show the potential energy (E) and the sum of 
the kinetic {Ek in ) and potential energy of the system as a function of the simulation time. 
It is seen that the same energy drift A(E + Eun)/Eun (0.1 in 0.5 ps) was found for the two 
simulations. This shows that the number of CG iterations to obtain a given accuracy in the 
energy conservation does not depend on the size of the system and that the overall scaling 
of the computational scheme is therefore linear. Finally we evaluated the relative error 
on the ionic forces Ft introduced by localization contraints as = ' 1 * — , where 

E/i F ~*i 

the overline indicates time averages, and the upperscripts loc and ext refer to calculations 
performed with localized and extended states, respectively. This error was found to be ~ 6% 
in crystalline diamond at room temperature. 

We note that if extended states are used, the number of iterations needed to have the 
same conservation of energy as the one reported in Fig. 5 is smaller than 10. Nevertheless our 
MD scheme applied to ordered systems becomes more efficient than direct diagonalization of 
the hamiltonian already for small systems, i.e. for systems containing more than 40 atoms. 
This can be seen in Fig. 6 where we compare the efficiency of our approach to that of direct 
diagonalization based MD schemes. 

We now analyze a MD simulation run during which the sets LRj change as a function of 
time. In Fig. 7 we show the potential energy for an oscillation of crystalline diamond around 
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equilibrium, computed with extended (E ext , dotted line) and with localized (E loc , solid line) 
orbitals as a function of simulation time (t). The two energies have been computed for 
the same ionic trajectories, generated by a simulation with localized orbitals. The MD run 
shown in Fig. 7 is the same as the one reported in Fig. 5 but now the LRs are allowed to vary 
in time. At t = tl, the evolution of the ionic positions makes the number of atoms belonging 
to given localization regions to increase. At t = t2, the ionic configuration is such to restore 
the localization regions as they were at t < tl. Since at t = tl, t2 an abrupt modification 
of the basis functions used for the expansion of {4>l} occurs, the potential energy E loc is 
discontinous and its derivative with respect to ionic positions is not well defined. However 
ionic forces can still be defined by neglecting the discontinuity in E and by evaluating either 
the left or the right derivatives of the potential energy. The numerical values of the left 
and right derivatives are in fact the same within a very small error. This error is negligible, 
being much smaller than the one introduced by localization contraints. This can be seen in 
Fig. 8 where we compare forces obtained in calculation with extended and localized orbitals 
by plotting dE ext /dt = £/ F| xt • v 7 (dotted line) and dE loc /dt = F^ oc • vj (solid line). On 
the scale of the picture no dicontinuity is observable in dE loc /dt at t = tl, t2. 

We now turn to the discussion of the simulation of liquid C, during which many changes 
of LRi were observed. We generated a diffusive state at T ~ 5000 K starting from a diamond 
network prepared at a macroscopic density of 2 grcm~ 3 ; we then heated the system by means 
of a Nose'-Hoover thermostat. We used a 64 atom cell with simple cubic periodic boundary 
conditions and only the T point to sample the BZ. We used a cutoff radius of 2.45 A for the 
hopping parameters entering the TB hamiltonian and for the two body repulsive potentiali! 
(i.e. the cutoff distances r m and d m of Ref. [22] are set at 2.45 A ). Equilibration of the 
system was performed in the canonical ensemble and temporal averages were taken over 
3.8 ps. The same simulation was repeated twice: once with our MD scheme and once by 
using direct diagonalization at each step. The radial distribution function g(r) and the 
partial atomic coordinations obtained in the two cases are shown in Fig. 9 and Table III, 
respectively. The agreement between the two descriptions is excellent, showing that the 
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LO scheme is accurate even for a difficult case such as a disordered system with differently 
coordinated atoms and metallic properties. The self-diffusion coefficients obtained in the two 
cases are 0.4 10~ 4 cm 2 sec _1 and 0.6 10~ 4 cm 2 sec -1 , respectively. The difference between the 
cohesive energies computed within the extended orbitals and the LO formulation for given 
ionic configurations is of the order of 2%, similarly to what found for crystalline structures. 

In the simulation for the liquid with LO, we used At = 5 a.u. and we performed 50 
iterations per ionic move, in order to minimize i?[Q]. This number is much larger than 
that needed for ordered systems, such as crystalline diamond. Consequently in the case of 
liquid C our scheme becomes advantageous with respect to direct diagonalization when the 
number of atoms is larger than 200. 

VII. CONCLUSIONS 

We have presented an approach to total energy minimizations and molecular dynamics 
simulations whose computational workload is linear as a function of the system size. This 
favourable scaling is obtained by using an energy functional whose minimization does not 
imply either explicit orthogonalization of the electronic orbitals or inversion of an overlap 
matrix, together with a localized orbital formulation. The use of LOs reduces the evaluation 
of the energy functional and of its functional derivative to the calculation of products of 
sparse matrices. 

The performances and efficiency of the method have been illustrated with several nu- 
merical examples for semiconducting and metallic systems. In particular we have presented 
molecular dynamics simulations for liquid carbon at 5000 K, showing that even for the case 
of a disordered metallic system the description provided by the LO formulation is reliable 
and very accurate. We have also shown that tight binding molecular dynamics simulations 
with 1000 atoms are easily feasible on small workstations, implying a one day run to obtain 
0.5 ps. Molecular dynamics simulations for very large C systems are underway. 



21 



VIII. ACKNOWLEDGEMENT 



It is a pleasure to thank R. Car, R. M. Martin and D. Vanderbilt for useful discussions 
and A. Possoz for her help in the optimization of the computer codes. We acknowledge 
support by the Swiss National Science Foundation under grant No. 21-31144.91. 



22 



REFERENCES 

1 For a review see, e.g., G. Galli and A. Pasquarello, in Computer Simulation in Chemical 
Physics, Edited by M.P.Allen and D.J.Tildesley, p. 261, Kluwer, Dordrecht (1993), and 
M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias and J. D. Joannopoulos, Rev. Mod. 
Phys. 64, 1045 (1993). 

2 G. Galli and M.Parrinello, Phys. Rev. Lett. 69, 3547 (1992). 

3 T. A. Arias, M. C. Payne and J. D. Joannopoulos, Phys. Rev. Lett. 69, 1077 (1992). 

4 R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985). 

5 W. Yang, Phys. Rev. Lett. 66, 1438 (1991). 

6 S. Baroni and P. Giannozzi, Europhys. Lett. 17, 547 (1991). 

7 W.-L. Wang and M. Teter, Phys. Rev. B 46, 12798 (1992). 

8 F. Mauri, G. Galli, and R. Car, Phys. Rev. B 47, 9973 (1993). 

9 X.-P. Li, R. Numes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993). 

°M. S. Daw, Phys. Rev. B 47, 10895 (1993). 

1 A. D. Drabold and O. Sankey, Phys. Rev. Lett. 70, 3631 (1993). 

2 W. Kohn, Chem. Phys. Lett. 208, 167 (1993). 

3 P. Ordejon, D. Drabold, M. Grunbach, and R. Martin, Phys. Rev. B 48, 14646 (1993). 
4 M. Aoki, Phys. Rev. Lett. 71, 3842 (1993). 
5 P. L6wdin, J. Chem. Phys. 18, 365 (1950). 

6 F. Tassone, F. Mauri, and R. Car, Phys. Rev. B submitted (1993). 
7 G. Pastore, E. Smargiassi, and F. Buda, Phys. Rev. A 44, 6334 (1991). 
8 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960). 

23 



G. Bachelet, D. Hamann, and M. Schliiter, Phys. Rev. B 26, 4199 (1982). 

L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48, 1425 (1982). 

L. Goodwin, A. Skinner, and D. Pettifor, Europhys. Lett. 9, 701 (1989). 

C. Xu, C. Wang, C. Chan, and K. Ho, J.Phys.: Condens. Matter 4, 6047 (1992). 



24 



FIGURES 



Fig. 1 Total energy (E) as a function of the number of iterations for a steepest descent 
minimization of 64 Si atoms in the diamond structure, described within LDA with a PW 
basis set. The solid and dotted lines correspond to the minimization of E[Q] and E 1 - (see 
text), respectively. Q was defined with M — 1. We used kinetic energy cutoffs of 12 and 
36 Ry for the wavefunctions and charge density, respectively and we set r\ at 3 Ry above 
the top of the valence band. Each run was started from the same set of random Fourier 
coefficients. 

Fig. 2 Total electronic charge as a function of the number of iterations for the energy 
minimizations reported in Fig. 1. The total number of electrons in the system is 256. 

Fig. 3 Percentage error on the cohesive energy of Si (diamond structure) as a function 
of the number of shells (iV e ) in the localization region, computed with a TB hamiltonian 
(see text). Diamonds and crosses refer to minimizations of E[Q] with Q[Af = 1] and 
Q[jV = 3], respectively. The LRs were defined using an euclidean metric (see text). The 
errors were evaluated with respect to a computation with extended orbitals. Calculations 
were performed at the same fixed volume. 

Fig. 4 Total energy (E) of diamond (dots), bidimensional graphite (open circles) and a 
carbon linear chain (squares) as a function of interatomic distance id), computed with a TB 
hamiltonian and supercells containing 216, 128 and 100 atoms, respectively. The dotted lines 
were obtained by minimizing at each volume E[Q] with Q[Af = 1], and by using localization 
regions defined with N h = 2. The solid lines were instead obtained by diagonalizing the 
hamiltonian, with no constraints on the wavefunctions. 
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Fig. 5 Potential energy (lower part) and the sum of the potential and kinetic energy 
(upper part) as a function of simulation time for crystalline C in the diamond structure at 
70 K. The dotted and solid lines refer to two calculations performed with a TB hamiltonian, 
with 64 atom and 1000 atom supercells, respectively. In both cases we used Q[jV = 1] and 
N h = 2; the LRs were computed for the configuration at K and held fixed during the whole 
simulation. 

Fig. 6 CPU time per ionic step (30 a.u.) as a function of the number N of atoms in 
the system, for a TB-MD simulation of C diamond at low temperature (see text). Squares 
and crosses indicate the CPU time in a direct diagonalization based scheme and in our MD 
approach (with 10 CG iterations per ionic move), respectively. Calculations were carried 
out on a Silicon Graphics Iris Indigo 4000. 

Fig. 7 Potential energy for an oscillation of crystalline diamond around equilibrium, 
computed with extended (dotted line) and with localized (solid line) orbitals as a function of 
simulation time. The two energy curves have been computed for the same ionic trajectories, 
generated by a simulation with localized orbitals. The LO calculation is the same as the 
one carried out in Fig. 5, but here the LRs are allowed to vary during the simulation, tl 
and t2 denotes times at which the LRs change. 

Fig. 8 Time derivatives dE cxt /dt = E/ Ff t • v 7 (dotted line) and dE loc /dt = £/ F^ oc • v 7 
(solid line) of the potential energy curves reported in Fig. 7 (see text). 

Fig. 9 Radial distribution function g(r) of liquid C (see text) computed as average over 
a TB-MD simulation of 3.8 ps. The results of the LO formulation with N h — 2 and M = 1 
(dotted line) are compared to those of a direct diagonalization scheme (solid line). The 
average number of atoms in a LR is 18. 



26 



TABLES 



Crystal structure 




E c [N h = 2] 


E c [N h = 3] 


E c [N h = oo] 


Diamond 


1.54 


7.16 


7.23 


7.26 


2D-graphite 


1.42 


7.09 


7.19 


7.28 


ID-chain 


1.25 


5.62 


5.75 


5.93 



TABLE I. Cohesive energy E c (eV) of different forms of solid carbon computed at a given lattice 
constant ro (A) as a function of the number of shells (Nh) included in the LR. The calculations 
were performed with a TB hamiltonian, with supercells containing 216, 128 and 100 atoms for 
diamond, two-dimensional graphite and the linear chain, respectively. 
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Crystal structure 


5ro (%) 


SE r (%) 


<55 (%) 


Diamond 


0.2 


1.4 


1.0 


2D-graphite 


0.4 


2.5 


1.4 


ID-chain 


0.5 


4.7 


2.7 



TABLE II. Percentage errors on the equilibrium lattice parameters (<5ro), cohesive energy 
(5E C ) and bulk modulus (5B) of diamond, graphite and a carbon linear chain, as obtained by 
minimizing E[Q] with Q[M = 1] and Nh = 2, described within a TB framework. The errors were 
evaluated with respect to a computation with extended orbitals. 





N h = 2 


N h = oo 


1-fold 


5 


4 


2-fold 


38 


42 


3-fold 


53 


50 


4-fold 


4 


4 



TABLE III. Percentage number of differently coordinated sites (N c ) in liquid C computed as 
averages over a TB-MD simulation of 3.8 ps. The results of the LO formulation with Nh = 2 and 
M = 1 are compared to those of a direct diagonalization scheme (N^ = oo). 
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